
Upgrade transformers==5.3.0 #17784

Merged
Fridge003 merged 91 commits into sgl-project:main from JustinTong0323:update-transformers-v5
Mar 18, 2026

Conversation

@JustinTong0323
Collaborator

@JustinTong0323 JustinTong0323 commented Jan 26, 2026

Motivation

Address #17779 — Upgrade transformers to 5.3.0.

Changes

  • Bump transformers>=5.2.0, huggingface_hub>=1.0.0; remove hf_transfer
  • get_rope_config() utility for backward-compatible config.rope_parameters access
  • Qwen2: remove padding_idx (transformers#41541)
  • Gemma3: adapt for v5 API changes
  • LLaVA: handle CLIPImageProcessorFast returning torch.Tensor instead of ndarray
  • Qwen2.5-VL encoder: use pooler_output instead of last_hidden_state
  • Tokenizer: explicitly use_fast=True in auto mode; fix special_tokens_pattern; sync text_config
  • Config: register custom configs with AutoConfig; GGUF version parsing workaround
  • Tests: fix _apply_rotary_emb import path; fix Qwen2.5-VL `.visual` → `.model.visual`
  • InternVL test: patch meta-tensor .item() and missing all_tied_weights_keys for v5 compat
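Based on the description above, the backward-compatible accessor might look like this (a sketch only: the actual get_rope_config in the PR may have a different signature and defaults):

```python
def get_rope_config(config):
    """Return a dict of RoPE parameters that works on both
    transformers v4 (separate rope_theta / rope_scaling attributes)
    and v5 (unified config.rope_parameters dict)."""
    rope = getattr(config, "rope_parameters", None)
    if rope is not None:
        return dict(rope)
    # v4 fallback: assemble the dict from the old attributes
    params = {"rope_theta": getattr(config, "rope_theta", 10000.0)}
    scaling = getattr(config, "rope_scaling", None)
    if scaling:
        params.update(scaling)
    return params
```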

TODO

  • Rope parameter handling (config.rope_parameters)
  • Qwen2 padding_idx removal
  • Gemma3 v5 adaptation
  • LLaVA CLIPImageProcessorFast tensor handling
  • Qwen2.5-VL encoder pooler_output
  • Tokenizer use_fast=True default, special_tokens_pattern fix
  • GGUF InvalidVersion: 'N/A' workaround
  • Test import path fix (_apply_rotary_emb)
  • Test fix: Qwen2.5-VL .visual moved to .model.visual
  • InternVL test: v5 meta-tensor init crashes torch.linspace().item() + missing all_tied_weights_keys
  • clean_up_tokenization removed in v5: InternVL's HF Hub tokenizer (trust_remote_code) still calls it, and TOKENIZER_MAPPING.register is bypassed by auto_map
  • Kimi-VL: is_torch_fx_available removed — upstream model code (moonshotai) or sglang shim
  • fp8 quantization incompatible with diffusers — upstream
  • Embedding model crash — SRT engine forward_batch_embedding fails with batch.input_ids=None (TypeError: object of type 'NoneType' has no len() in ForwardBatch.init_new)
  • MiniCPM-o-2_6: v5 AutoProcessor fails — ValueError: Unrecognized feature extractor; v5 can't resolve feature extractor for MiniCPM-o model type
  • MiniCPM-V-4: model sees "text/slice" instead of images — vision embeddings not working correctly (needs investigation)
  • InternVL test: model loads but inference output is wrong (describes both images as SGL logos)
  • DeepSeek-OCR: missing addict, matplotlib packages in CI (not v5-related)
  • DeepSeek-OCR: is_deepseek_nsa() crashes on dict hf_text_config with AttributeError: 'dict' object has no attribute 'architectures'
  • Embedding test_matryoshka_embedding: v5 respects config.is_causal=false → bidirectional attention in HF reference, but SGLang always uses causal
  • InternVL2.5-8B piecewise cuda graph: MGSM accuracy drops to ~0.36 (v5-specific; individual prompts correct, fails under concurrent eval load)
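For the GGUF InvalidVersion: 'N/A' item above, the workaround likely amounts to guarding the version parse; a minimal sketch using packaging.version (the helper name here is illustrative, not the PR's actual code):

```python
from packaging.version import InvalidVersion, Version


def parse_version_or_none(raw: str):
    """GGUF metadata sometimes reports the version as 'N/A',
    which packaging refuses to parse; treat that as unknown."""
    try:
        return Version(raw)
    except InvalidVersion:
        return None
```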

…ple files

- Updated `huggingface_hub` dependency to version `>=1.0.0` in `pyproject_cpu.toml`, `pyproject_npu.toml`, `pyproject_other.toml`, `pyproject_xpu.toml`, and `pyproject.toml`.
- Upgraded `transformers` dependency to version `5.0.0` in the same files.
- Removed `hf_transfer` from the dependencies in the aforementioned files.
- Refactored the handling of `rope_theta` and `rope_scaling` parameters to use `config.rope_parameters` in various model files for consistency and improved maintainability.

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
@github-actions github-actions bot added the quant (LLM Quantization), dependencies (Pull requests that update a dependency file), deepseek, npu, and diffusion (SGLang Diffusion) labels Jan 26, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @JustinTong0323, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on upgrading core dependencies, most notably the transformers library to version 5.0.0 and huggingface_hub to >=1.0.0. These updates necessitate significant refactoring across various model implementations to align with changes in how Rotary Positional Embedding (RoPE) parameters (rope_theta and rope_scaling) are accessed, moving towards a more unified config.rope_parameters approach. Additionally, the hf_transfer dependency has been removed, and test runners have been adapted to new API changes in the transformers library for multimodal models. The overall impact is enhanced compatibility with the latest Hugging Face ecosystem and improved code consistency.

Highlights

  • Dependency Upgrades: The transformers library has been updated to version 5.0.0, and huggingface_hub to >=1.0.0 across all pyproject.toml configurations.
  • hf_transfer Removal: The hf_transfer dependency and its related activation logic have been removed from project configurations and utility files, streamlining dependencies.
  • RoPE Parameter Refactoring: The handling of rope_theta and rope_scaling parameters has been refactored across numerous model implementations to consistently use config.rope_parameters.get() for improved maintainability and compatibility with the updated transformers library.
  • Test Runner Adaptations: Model loading and feature extraction logic in sglang/test/runners.py have been updated to reflect changes in the transformers library, specifically replacing AutoModelForVision2Seq with AutoModelForImageTextToText and adjusting feature extraction calls to use return_dict=True and pooler_output.
  • Tokenizer Usage Update: A minor adjustment was made in test/registered/core/test_score_api.py to use tokenizer() directly instead of tokenizer.encode_plus().


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request primarily focuses on updating the transformers library to version 5.0.0 and adapting the codebase to changes introduced in this new version. Key modifications include updating huggingface_hub and transformers dependencies, removing the hf_transfer dependency and its associated code, and refactoring the access pattern for rope_theta and rope_scaling parameters across various model files to use config.rope_parameters.get(...). Additionally, transformers API calls in test files have been updated to reflect changes in class names and output access methods. These changes are well-justified and necessary for compatibility with the upgraded transformers library, improving overall code maintainability and consistency.

…les for cleaner dependency management.

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
@JustinTong0323
Collaborator Author

JustinTong0323 commented Jan 26, 2026

/tag-and-rerun-ci run again again

@tugot17
Contributor

tugot17 commented Jan 29, 2026

The tokenizer_manager.py will also have to be changed, right?

# Wrap each token ID in its own list for batch_decode to decode them separately:
# batch_decode([1, 2, 3]) concatenates the tokens, while
# batch_decode([[1], [2], [3]]) decodes each one separately.
token_texts = self.tokenizer.batch_decode([[idx] for idx in token_logprobs_idx])

At least I had to change this to make sglang run after manually upgrading.
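A toy stand-in illustrates why each ID is wrapped (this only mimics batch_decode's list-of-sequences contract; it is not the real HF tokenizer API):

```python
class ToyTokenizer:
    """Minimal stand-in for a tokenizer whose batch_decode treats each
    element of the batch as one sequence of token IDs."""

    vocab = {1: "Hello", 2: " world", 3: "!"}

    def batch_decode(self, batch):
        # One output string per sequence in the batch
        return ["".join(self.vocab[t] for t in seq) for seq in batch]


tok = ToyTokenizer()
# One sequence of three tokens -> one concatenated string
whole = tok.batch_decode([[1, 2, 3]])
# Wrapping each ID in its own list -> one string per token
per_token = tok.batch_decode([[i] for i in [1, 2, 3]])
```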

@vincentzed vincentzed mentioned this pull request Jan 29, 2026
Fix tokenizer behavior in auto mode to ensure compatibility with Transformers v5 by explicitly setting use_fast=True when not provided.
@alisonshao
Collaborator

stage-b-test-small-1-gpu (5) test_embedding_models.py passed on main

CI Status Update (Run 23168742145 @ 17888cc)

After merging latest main (including #20715 sglang-kernel CI fix):

✅ All stage-b tests pass (except known pre-existing issue)

  • stage-b-test-small-1-gpu: 7/8 passed — shard 5 fails on Qwen3-Embedding numerical precision on Blackwell (pre-existing, same on main)
  • stage-b-test-large-1-gpu: 14/14 passed (shard 10 passed on rerun — port conflict flake)
  • stage-b-test-large-2-gpu: 4/4 passed
  • stage-b-test-4-gpu-b200: passed
  • DSV3 accuracy test (MLA + torch compile + FA3 + MTP) all pass

❌ Blocked by pre-existing issues

  • stage-c all skipped — blocked by wait-for-stage-b gate failing due to shard 5
  • multimodal-gen-test — FLUX.2-dev 401 auth (gated model access)

The only blocking failure (stage-b-test-small-1-gpu (5) Qwen3-Embedding) is a known pre-existing Blackwell issue that also fails on main. Could a maintainer bypass this gate to run stage-c tests?

@JustinTong0323
Collaborator Author

JustinTong0323 commented Mar 17, 2026

stage-b-test-small-1-gpu (5) test_embedding_models.py passed on main

…asses

The _fix_v5_add_bos_eos_token function was blindly restoring add_eos_token
from tokenizer_config.json for all models, but Qwen2Tokenizer did not
support this flag in v4. This caused gte-Qwen2-1.5B-instruct to add an
unexpected EOS token, breaking embedding similarity tests.

Changes:
- Only restore BOS/EOS flags for tokenizer classes that supported them in
  v4 (LlamaTokenizer, GemmaTokenizer, etc.)
- Use v4 defaults (add_bos_token=True) when config value is null/missing
  to prevent update_post_processor() from dropping BOS
- Apply the same tokenizer fix to sentence-transformers in the HF test
  runner so HF reference and SRT produce matching tokens
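A sketch of the class-gated restore described in this commit (the class set, helper name, and defaults are illustrative; the PR's actual implementation may differ):

```python
# Tokenizer classes that honored add_bos_token / add_eos_token in v4.
# Qwen2Tokenizer is deliberately absent: it never supported these flags.
_V4_BOS_EOS_TOKENIZERS = {
    "LlamaTokenizer", "LlamaTokenizerFast",
    "GemmaTokenizer", "GemmaTokenizerFast",
}


def restore_bos_eos_flags(tokenizer_class: str, config: dict) -> dict:
    """Return the BOS/EOS flags to restore from tokenizer_config.json,
    or an empty dict for classes that ignored them in v4."""
    if tokenizer_class not in _V4_BOS_EOS_TOKENIZERS:
        return {}
    return {
        # v4 default was add_bos_token=True when the config value is
        # null/missing, so only an explicit False disables it
        "add_bos_token": config.get("add_bos_token") is not False,
        "add_eos_token": bool(config.get("add_eos_token")),
    }
```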
…duce dead code

- Extract compute_mla_mscale_scaling() to replace 4 copy-pasted rope
  scaling blocks in model_config.py and deepseek_v2.py
- Extract _resolve_local_or_cached_file() to deduplicate local-path-then
  -hf_hub_download pattern across 3 tokenizer/processor fix functions
- Extract ensure_numpy() in mm_utils.py to deduplicate torch.Tensor to
  numpy conversion in mm_utils.py and llava.py
- Restructure get_hf_text_config() as proper elif chain with documented
  priority (thinker > llm > language > text), fixing assert that fired
  on text_config even when llm_config would override it
- Add thinker_config to early dict-to-PretrainedConfig conversion loop
- Remove dead dict-conversion branch in _patch_text_config (already
  handled by early loop in get_hf_text_config)
- Use from_dict instead of from_pretrained in get_config KeyError
  handler to avoid redundant disk read
- Add warning log for AutoImageProcessor failures in
  _build_processor_manually instead of silent swallow
- Fix operator precedence in is_deepseek_nsa: extract index_topk to
  separate variable for clarity
- Fix llama_eagle3 wrong nested key: rope_parameters["rope_type"]
  instead of rope_parameters["rope_scaling"]["rope_type"]
- Fix midashenglm same nested key bug: directly delete mrope_section
  from rope_parameters instead of writing to nonexistent nested key
- Fix gemma3n_causal _tied_weights_keys: use v5 dict format
  {target: source} instead of v4 list format
- Add debug logging to broad except clauses in hf_transformers_utils
- Narrow _ensure_llama_flash_attention2_compat exception to ImportError
- Improve batch_decode comment in tokenizer_manager
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
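The ensure_numpy helper extracted in this commit presumably reduces to something like the following duck-typed sketch (the real version may handle more dtypes and devices):

```python
import numpy as np


def ensure_numpy(x):
    """Convert tensor-like objects (anything exposing .detach/.cpu/.numpy,
    e.g. torch.Tensor) to a NumPy array; pass arrays and lists through.
    Needed because CLIPImageProcessorFast in v5 returns tensors where
    v4 returned ndarrays."""
    if hasattr(x, "detach"):  # torch.Tensor path, without importing torch
        return x.detach().cpu().numpy()
    return np.asarray(x)
```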
@nvpohanh
Collaborator

@JustinTong0323 Please let me know if you need help with fixing the issues after transformers version upgrade. Thanks!

@JustinTong0323
Collaborator Author

@JustinTong0323 Please let me know if you need help with fixing the issues after transformers version upgrade. Thanks!

Thanks! I think most of the work is done and we only need to pass the CI now.

@Fridge003 Fridge003 merged commit d1e95af into sgl-project:main Mar 18, 2026
374 of 413 checks passed
Talantan1102 pushed a commit to randgun/sglang that referenced this pull request Mar 19, 2026
michaelzhang-ai added a commit to michaelzhang-ai/sglang that referenced this pull request Mar 20, 2026
…ccuracy threshold

PR sgl-project#17784 (transformers 5.3.0 upgrade) changed grok.py to access
config.rope_parameters["rope_theta"] directly, but GitConfig (grok-2)
does not have this attribute, crashing the server on startup with
AttributeError: 'GitConfig' object has no attribute 'rope_parameters'.

Restore safe access via getattr with fallback, matching the pattern
used elsewhere in the codebase.

Also lower the MI325 Grok-2 GSM8K accuracy threshold from 0.915 to
0.90 to match the MI35x test, since nightly sgl-project#636 showed 0.910 which
is within normal run-to-run variance.
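The safe-access pattern this commit restores can be sketched as follows (variable and function names are illustrative, not grok.py's actual code):

```python
DEFAULT_ROPE_THETA = 10000.0


def get_rope_theta(config):
    """Prefer the v5 unified rope_parameters dict, but fall back
    gracefully for configs (e.g. GitConfig) that never gained it."""
    rope_parameters = getattr(config, "rope_parameters", None) or {}
    return rope_parameters.get(
        "rope_theta",
        getattr(config, "rope_theta", DEFAULT_ROPE_THETA),
    )
```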
@guapisolo
Contributor

Great job.

Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
Co-authored-by: Alison Shao <alisonshao@mac.lan>
Co-authored-by: Mick <mickjagger19@icloud.com>
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
Co-authored-by: Alison Shao <alisonshao@mac.lan>
Co-authored-by: Mick <mickjagger19@icloud.com>
@yudian0504
Contributor

K2.5 failed with new transformers==5.3.0:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.10/site-packages/sglang/launch_server.py", line 68, in <module>
    run_server(server_args)
  File "/opt/conda/lib/python3.10/site-packages/sglang/launch_server.py", line 52, in run_server
    launch_server(server_args)
  File "/opt/conda/lib/python3.10/site-packages/sglang/srt/entrypoints/http_server.py", line 2235, in launch_server
    Engine._launch_subprocesses(
  File "/opt/conda/lib/python3.10/site-packages/sglang/srt/entrypoints/engine.py", line 681, in _launch_subprocesses
    tokenizer_manager, template_manager = init_tokenizer_manager_func(
  File "/opt/conda/lib/python3.10/site-packages/sglang/srt/entrypoints/engine.py", line 131, in init_tokenizer_manager
    tokenizer_manager = TokenizerManagerClass(server_args, port_args)
  File "/opt/conda/lib/python3.10/site-packages/sglang/srt/managers/tokenizer_manager.py", line 269, in __init__
    self.init_tokenizer_and_processor()
  File "/opt/conda/lib/python3.10/site-packages/sglang/srt/managers/tokenizer_manager.py", line 335, in init_tokenizer_and_processor
    _processor = _get_processor_wrapper(server_args)
  File "/opt/conda/lib/python3.10/site-packages/sglang/srt/managers/tokenizer_manager.py", line 3015, in _get_processor_wrapper
    processor = get_processor(
  File "/opt/conda/lib/python3.10/site-packages/sglang/srt/utils/hf_transformers_utils.py", line 1180, in get_processor
    processor = AutoProcessor.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/processing_auto.py", line 407, in from_pretrained
    return processor_class.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/processing_utils.py", line 1403, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, processor_dict, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/processing_utils.py", line 1517, in _get_arguments_from_pretrained
    tokenizer = cls._load_tokenizer_from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/processing_utils.py", line 1464, in _load_tokenizer_from_pretrained
    tokenizer = auto_processor_class.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 732, in from_pretrained
    tokenizer_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 583, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module, force_reload=force_download)
  File "/opt/conda/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 309, in get_class_in_module
    module_spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/root/.cache/huggingface/modules/transformers_modules/tokenization_kimi.py", line 11, in <module>
    from transformers.models.gpt2.tokenization_gpt2 import bytes_to_unicode
ImportError: cannot import name 'bytes_to_unicode' from 'transformers.models.gpt2.tokenization_gpt2' (/opt/conda/lib/python3.10/site-packages/transformers/models/gpt2/tokenization_gpt2.py)

@yudian0504
Contributor

K2.5 failed with new transformers==5.3.0 (same traceback as in the previous comment).

update: solved by updating https://huggingface.co/moonshotai/Kimi-K2.5/blob/main/tokenization_kimi.py
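For reference, bytes_to_unicode is a small self-contained function, so remote tokenizer code hitting this ImportError can simply vendor it instead of importing it from transformers; this is the standard GPT-2 byte-level BPE mapping:

```python
def bytes_to_unicode():
    """Reversible byte -> unicode-character map used by GPT-2-style BPE.
    Printable bytes map to themselves; the remaining bytes are shifted
    above 255 so every byte gets a distinct, printable character."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("\u00a1"), ord("\u00ac") + 1))
        + list(range(ord("\u00ae"), ord("\u00ff") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))
```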

dutsc pushed a commit to dutsc/sglang that referenced this pull request Mar 30, 2026
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
Co-authored-by: Alison Shao <alisonshao@mac.lan>
Co-authored-by: Mick <mickjagger19@icloud.com>